SCIENTIFIC 

REPORTS 




OPEN 



SUBJECT AREAS: 

COMPARATIVE 
GENOMICS 

BACTERIAL GENOMICS 



Received 
17 September 2013

Accepted 
14 January 2014 

Published 
29 January 2014 



Correspondence and 
requests for materials 
should be addressed to 
P.X. (pingxu@sjtu.edu.cn)



Genomic analysis of thermophilic Bacillus 
coagulans strains: efficient producers for
platform bio-chemicals 



Fei Su & Ping Xu 



State Key Laboratory of Microbial Metabolism, and School of Life Sciences & Biotechnology, Shanghai Jiao Tong University, 
Shanghai 200240, P. R. China. 

Microbial strains with high substrate efficiency and excellent environmental tolerance are urgently needed 
for the production of platform bio-chemicals. Bacillus coagulans has these merits; however, little genetic 
information is available about this species. Here, we determined the genome sequences of five B. coagulans 
strains, and used a comparative genomic approach to reconstruct the central carbon metabolism of this 
species to explain their fermentation features. A novel xylose isomerase in the xylose utilization pathway was 
identified in these strains. Based on a genome-wide positive selection scan, the selection pressure on amino 
acid metabolism may have played a significant role in the thermal adaptation. We also researched the 
immune systems of B. coagulans strains, which provide them with acquired resistance to phages and mobile 
genetic elements. Our genomic analysis provides comprehensive insights into the genetic characteristics of 
B. coagulans and paves the way for improving and extending the uses of this species. 

White biotechnology, the clean industrial technology supported by several predominant political movements,
will comprise no less than 20% of chemical industry sales in the United States in 2020 1. To
meet the rapidly growing demand for these materials, researchers have used microbial
strains to produce platform bio-chemicals 2. Microbial strains that combine high substrate
efficiency, low by-product formation, and excellent environmental tolerance are not easy to isolate from nature 3.
However, Bacillus coagulans is one such microorganism that has a number of these characteristics 4-7. It is a
spore-forming gram-positive soil bacterium that can be found all over the world 8. This species was first isolated in
1915 by Hammer from spoiled canned milk 9. Recently, B. coagulans has been
reported to possess many valuable fermentation features, such as growth at 50°C-55°C and high carbon
efficiency 7. In addition, it can ferment various biomass-derived sugars to yield platform bio-chemicals, such
as lactic acid. Moreover, the high fermentation temperature of B. coagulans strains enables non-sterilized batch
and fed-batch fermentation for L-lactic acid production 4. For example, Qin et al. used B. coagulans 2-6 to obtain a
maximum L-lactic acid concentration of 182.0 g/liter with an optical purity of 99.4% at 50°C 4. Milind et al. 7,10 and
Wang et al. 6 reported that B. coagulans strains produce lactic acid, and that ~98% of xylose could be converted to
L-lactic acid. In addition to lactic acid, B. coagulans has also been shown to be a source of many other
commercially valuable products, such as thermostable enzymes 8 and coagulin, an antimicrobial peptide 11. More
recently, this species has also been regarded as a novel safe probiotic 12. These studies suggest that B. coagulans
strains can readily achieve the generally recognized as safe (GRAS) status required for large-scale commercial use.
Compared to other probiotic strains, such as those belonging to Lactobacillus species, B. coagulans strains are able
to survive as spores in extreme environments, such as high heat or acidity 12.

B. coagulans is one of the earliest isolated microorganisms, and it is thought to be an ideal industrial organism 
with remarkable advantages in the manufacturing of various chemicals and enzymes 9,13. However, little genetic
information about this species is currently available, and there are still many questions to answer, such as why 
B. coagulans strains can produce high concentrations of optically pure L-lactic acid and why they can ferment 
openly without sterilization. We have recently determined the nucleotide sequences of B. coagulans strains 2-6,
XZL4, XZL9, H-1 and DSM1 14-17. Comparative genomic analysis of these strains should provide us with
comprehensive insights into the metabolic characteristics of B. coagulans and its niche-specialized adaptation.
Ultimately, we hope that this analysis will help answer the questions stated above and lead to new strategies 
for using and genetically improving these already useful strains. 
Results

Genome sequence and phylogenetic analysis. Table 1 shows the
genomic features of all B. coagulans strains examined in this study.
B. coagulans strains share many characteristics with those of the B.
subtilis group, including B. subtilis, B. licheniformis and B.
amyloliquefaciens. Their genomes have similar GC content (43%-47%)
and they grow well over a wide range of temperatures. However, the
chromosomes of B. coagulans strains (~3 Mbp) are smaller than
those of the B. subtilis group (~4 Mbp). The genomes of B. coagulans
strains 36D1 and XZL9 are significantly larger than those of the other
strains. In contrast, the B. coagulans 2-6 genome in GenBank
is smaller than those of all other Bacillus strains previously reported 16.
Comparison of the six B. coagulans genomes showed a high degree
of sequence similarity and gene synteny in genome core regions
(Figure 1). For a comprehensive comparative analysis, we
concatenated all conserved genes and constructed a phylogenetic tree
(Figure 2). Unexpectedly, all B. coagulans strains, which
cluster together, are more closely related to the B. cereus group
than to the B. subtilis group, which is consistent with the result of
Mun et al. 18. However, we did not find any gene related to the PlcR
regulon, which is the main virulence system of the B. cereus group.
Within B. coagulans, two strains (36D1 18 and XZL9)
have nearly identical genome sizes (~3.5 Mbp), genomic context
and gene orders, and have diverged from the other B. coagulans
strains quite recently; strain 2-6 diverged from XZL4, H-1 and
DSM1.
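
The GC content and GC skew tracks shown in Figure 1 can be computed with a simple sliding window. The following is a minimal sketch rather than the pipeline used in this study: the function name `gc_windows`, the non-overlapping step, and the toy sequence are illustrative assumptions; only the skew formula (G - C)/(G + C) and the 10-kb window size come from the Figure 1 caption.

```python
def gc_windows(seq, window=10_000, step=10_000):
    """Yield (start, gc_content, gc_skew) for consecutive windows of a DNA sequence.

    gc_content = (G + C) / window length
    gc_skew    = (G - C) / (G + C)
    """
    seq = seq.upper()
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        gc_content = (g + c) / window
        # The skew is undefined for a window without G or C; report 0.0 in that case.
        gc_skew = (g - c) / (g + c) if (g + c) else 0.0
        yield start, gc_content, gc_skew


# Toy example with an 8-bp window (a real chromosome would use the 10-kb default).
toy = "GGGCATAT" + "CCCGATAT"
results = list(gc_windows(toy, window=8, step=8))
for start, content, skew in results:
    print(f"start={start} GC content={content:.2f} GC skew={skew:+.2f}")
```

In a real analysis the sequence would be read from the GenBank or FASTA record of the chromosome, and the sign change of the cumulative skew is commonly used to locate the replication origin and terminus.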

Central carbon metabolism. As industrial producers, B. coagulans 
strains have the ability to use a variety of carbohydrates. Hexoses
and pentoses are the main substrates in fermentation;
therefore, we focused on these two classes of sugars.
Based on our previous studies 4-7 , B. coagulans is a homofermentative 
Bacillus strain that has an efficient metabolic pathway for utilizing 
hexose. The primary product of glucose fermentation is L-lactic acid 
(ca. 97% of the fermentation products), and small amounts of acetate 
and succinate are also produced 4 . Our genomic analysis results 
indicate that the essential genes for the Embden-Meyerhof-Parnas 
(EMP) pathway are present, whereas those for the Entner-Doudoroff 
pathway are absent in B. coagulans. These findings suggest that most of the
hexose goes through the EMP pathway, which is the most efficient 
pathway for converting hexose into lactic acid. Conversely, pentoses, 
such as xylose, usually go through two pathways, namely the 
phosphoketolase pathway (PKP) and pentose phosphate pathway 
(PPP) (Figure 3A). If xylose is metabolized through the PKP, the 
theoretical lactic acid yield is not expected to be higher than 60%. 
However, when xylose was provided as the carbon source for 
different B. coagulans strains, lactic acid represented 70% to 98% 
of the total fermentation products 6,7 . We found that in all strains 
except 36D1, only a fragment of the phosphoketolase gene was 
predicted. This gene, which encodes the enzyme that catalyzes the 
conversion of ribose-5-phosphate to 5-phospho-ribose-1-diphosphate,
the first step of the PKP, is interrupted by transposases. Although 
strain 36D1 has a full-length predicted phosphoketolase gene, based 
on the results of 13C-NMR experiments, the PPP is still the main



metabolic pathway for xylose utilization in the strain 36D1 7 . It is 
not surprising that we found genes encoding the enzymes that 
catalyze the conversion of D-xylulose-5P to fructose-6P and 
glyceraldehyde-3P. 
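
The 60% ceiling quoted above for the PKP can be reproduced with a short calculation. This sketch assumes the usual textbook stoichiometry, which is not stated in this paper: via the PKP one xylose yields one lactate (the remaining two carbons leave as acetyl phosphate), whereas via the PPP three xylose yield five lactate. Biomass formation and by-products are ignored, and all names are illustrative.

```python
XYLOSE_G_PER_MOL = 150.13       # C5H10O5
LACTIC_ACID_G_PER_MOL = 90.08   # C3H6O3


def lactate_yield(mol_lactate_per_mol_xylose):
    """Return (mass yield in g/g, carbon yield) of lactic acid from xylose."""
    mass_yield = mol_lactate_per_mol_xylose * LACTIC_ACID_G_PER_MOL / XYLOSE_G_PER_MOL
    carbon_yield = mol_lactate_per_mol_xylose * 3 / 5  # 3 C per lactate, 5 C per xylose
    return mass_yield, carbon_yield


# Assumed textbook stoichiometries (not taken from this study).
pkp_mass, pkp_carbon = lactate_yield(1.0)      # phosphoketolase pathway
ppp_mass, ppp_carbon = lactate_yield(5 / 3)    # pentose phosphate pathway

print(f"PKP: {pkp_mass:.2f} g/g, {pkp_carbon:.0%} of carbon")
print(f"PPP: {ppp_mass:.2f} g/g, {ppp_carbon:.0%} of carbon")
```

Under these assumptions, reported lactic acid fractions of 70% to 98% from xylose exceed what the PKP alone could deliver, which is consistent with the PPP carrying most of the flux.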

Our analysis showed that in addition to the highly efficient EMP 
and PPP pathways, all the studied B. coagulans strains contain
L-lactate dehydrogenase and produce L-lactic acid with high optical
purity (more than 99%) under facultative anaerobic conditions 4,19,20.
Meanwhile, all the strains also contain D-lactate dehydrogenase
(D-LDH) genes, even though no D-lactate dehydrogenase activity was
detected in these strains 4. The D-LDH encoding gene in some strains
(2-6, XZL9, and XZL4) is interrupted by a premature stop codon. We 
compared the remaining D-LDHs with those from L. bulgaricus 21,22 . 
Some residues of D-specific lactate dehydrogenase that are essential 
for substrate specificity and catalysis have changed (Tyr52Leu, 
Asn77Thr, Val78Ala and Trp135Val) 23 (Figure 4). Furthermore,
we could not find any genes encoding pyruvate decarboxylase, which 
is a part of the pyruvate dehydrogenase complex. 

B. coagulans can also produce other bio-chemicals. For example, 
we found genes encoding proteins involved in the production of 
acetoin and butanediol, which are very important platform
bio-chemicals 24. According to our present experiments (Figure S1) and
previous studies 7,23, ethanol, acetoin and butanediol are the primary
fermentation products under aerobic conditions. 

Xylose metabolism. Previous studies showed that approximately 
half of B. coagulans species have the ability to metabolize D-xylose, 
which is the most important difference among the B. coagulans 
strains 25. Genomic context analysis indicated that the
xylose-utilizing strains (36D1, XZL9 and XZL4) contain at least one
copy of the xyl operon, which is similar to that in B. subtilis 26. In
the other B. coagulans strains, we found incomplete xyl operons that lack the
xylose H+-symporter gene (xylT), which likely explains their
inability to utilize D-xylose. In strains 36D1 and XZL9, there are
also some other genes related to xylose metabolism, such as
xylose regulators, a xylose ABC transporter (xylFGH), and a
xyloside Na+(H+)-symporter (xynT). To obtain a full picture of
xylose-utilization in B. coagulans, we compared the xyl operons in 
all B. coagulans strains (Figure 3B). 

Xylose isomerase (a synonym for glucose isomerase) is required 
for the first step of the xylose utilization, conversion of D-xylose into 
D-xylulose. B. coagulans is a known source of xylose isomerase for 
industrial production 27. This enzyme was characterized in detail in L.
lactis 28, and its orthologs are present in many bacteria. Using
pan-genome analysis (Data S1), we identified two orthologous groups of
xylose isomerases, both of which belong to the xylose isomerase-like
TIM barrel family (Pfam: PF01261). One of the orthologous groups
has homology to xylA from B. subtilis, with ~85% similarity. We
performed a phylogenetic analysis and identified the other group as
an alternative xylose isomerase that we termed xylA-III, which is not
homologous to xylA from B. subtilis or to xylA-II from Clostridium
acetobutylicum 26 (Figure 3C). In the genomes of strains 36D1 and
XZL9, xylA-III is not clustered with any other xylose-utilization
genes. Although they were not predicted to be in a genomic island, 



Table 1 | Genomic features of the Bacillus coagulans strains

Feature | 2-6 | 36D1 | DSM1 | H-1 | XZL4

Figure 1 | Circular representation of the Bacillus coagulans 2-6 chromosome. The nine circles (from outside to inside) show the following: (i) the
predicted ORFs on the plus and minus strands based on the COG database (colors were assigned according to the colors of the COG functional classes,
which are listed at the bottom); (ii-vi) homology of B. coagulans 2-6 CDSs identified using BLAST in strains XZL4, XZL9, DSM1, H-1 and 36D1 (red-to-blue
colors were assigned according to the similarity of homologs); (vii) the genomic islands predicted by IslandViewer; (viii) the value of the GC skew
((G - C)/(G + C)); and (ix) the percentage of GC content with a 10-kb window size.

Figure 2 | Maximum likelihood tree of Bacillus strains. Genes that are conserved in all strains were aligned and concatenated for tree construction. The B.
coagulans strains are highlighted in green, and strains from the B. cereus group are highlighted in pink. A scale bar for the genetic distance is
shown at the bottom.



we suppose that xylA-III is part of a mobile genetic element that was
obtained by horizontal gene transfer (HGT). In addition, we also
found xylA-III pseudogenes in XZL4, H-1, and DSM1, which
resulted from a premature stop codon. These genes were likely
interrupted during integration into the chromosome. Xylulokinase is
required for the phosphorylation of D-xylulose, yielding
D-xylulose-5-phosphate, a key intermediate in the PPP. Unlike the xylose
isomerases, we found only one category of xylulokinase, which is very
well conserved (>95% identity). The phylogenetic analysis showed
that the second xylulokinase in strains 36D1 and XZL9 likely resulted
from gene duplication (Figure 3D). In addition, we found a fragment
of a xylA-III gene directly upstream of the second xylB in strain
36D1, which may have been generated during integration.

Due to the lack of a xylose uptake system, B. subtilis is unable to
grow on xylose as a sole carbon source 26. In the B. coagulans strains,
we identified three different types of xylose/xyloside transport
systems. The xylose H+-symporter gene (xylT), which belongs to the Major
Facilitator Superfamily (MFS) of transporters, is the last gene
of the xyl operon. This gene, which is crucial for xylose uptake, has
been reported in B. megaterium 29 and L. brevis 30. Another gene, xynT,
which also belongs to the MFS transporter family, imports xyloside
across the membrane. However, we found no genes related to
xyloside utilization. The ABC-type xylose transporter xylFGH was
originally described in Escherichia coli 31. The genes encoding this ABC
transporter system are separated from the other xylose-utilization genes,
and there is an AraC homolog next to the ABC transporter, which is
associated with the control of xylose uptake.

Protein secretion systems. Because B. coagulans strains are a source of
many thermostable industrial enzymes, their protein secretion systems
are crucial. Two types of protein secretion system, the Sec-
and Tat-dependent systems, were identified in the B.
coagulans strains (Table S2). The protein secretion systems in B.
coagulans strains are fully orthologous to those in B. subtilis 32.
Similar to B. subtilis, B. coagulans strains lack a secretion-specific
targeting factor similar to the SecB protein of E. coli. However, in all



B. coagulans strains, there are highly conserved signal recognition
particle (SRP) pathways that play important roles in the
translocation of pre-proteins. The SRP complex (Ffh), which acts
as a cellular chaperone, binds to the signal peptide of a nascent
polypeptide chain and is targeted to the membrane with the help of FtsY. The
pre-protein translocation machinery of the Sec-dependent system
consists of SecA, SecYEG, and SecDF, which are present in all B.
coagulans strains. SecYEG functions as a membrane channel for
protein export with the aid of SecA. However, we found that the
secE gene was missing in the genome of strain 2-6. In the
Tat-dependent secretion system, pre-proteins with twin-arginine signal
peptides fold in the cytoplasm and are translocated by the Tat
complex (TatAC) in the membrane 33. At the final stage, signal
peptidases (SPases) remove the signal peptide from pre-proteins. We
identified two types of SPases (type I and II) in the genomes of all
B. coagulans strains.

Natural competence. Natural competence is the ability of a cell to 
take up free DNA from the surrounding medium. To incorporate 
DNA from the medium, cells synthesize a specific DNA-binding and 
-uptake system to efficiently replace homologous regions of the 
chromosome, leading to a permanent change in cell phenotype. 
Currently, five genes are known to be
essential for DNA transport: comCEFG and nucA 34. Based on
orthology analysis, we identified four of
these genes, comCEFG (Table S3). The comE operon encodes a
polytopic transmembrane protein (ComEC), which is thought to 
form a pore that guides the DNA into the cell interior, where it 
may associate with the DNA-helicase-like protein encoded by
comF. The comG-encoded protein takes up the DNA by using a
pilin-like structure. ComC appears to be involved in the correct 
assembly of this structure 35 . In addition, we have also identified the 
transcriptional factor ComK in all strains, which regulates the 
expression of genes for DNA uptake and recombination in Bacilli. 
According to the research of Kovacs 36 , although the B. coagulans 
ComK recognized several elements similar to those of B. subtilis, 

Figure 3 | Comparative analysis of xylose metabolism in the B. coagulans strains. (A) The metabolic pathway for xylose fermentation in lactic acid
bacteria. (B) Schematic gene maps of the xylose-utilization genes found in the B. coagulans strains examined in this study. Unfilled genes are
pseudogenes. xylR: DNA-binding transcriptional activator; xylA: xylose isomerase Type I; xylB: xylulokinase; xylT: xylose H+-symporter; xylAIII: xylose
isomerase Type III; P1: quinolinate synthetase; P2: iron-containing alcohol dehydrogenase; P3: peptidase; P4: hypothetical protein; P5: PfkB
domain-containing protein; P6: LacI family transcriptional regulator; P7: beta-ketoacyl reductase; P8: hypothetical protein; P9: breakpoint of contigs; P10:
hypothetical protein; P11: hypothetical protein; P12: 6-phospho-3-hexuloisomerase. (C) Maximum likelihood tree of xylA genes. The genes marked with
filled circles ( • ) are from B. coagulans strains and are homologs of xylA in B. subtilis. The genes marked with squares ( ■ ) are from B. coagulans strains
and are the novel xylose isomerases discovered in this study. (D) Maximum likelihood tree of xylB genes. The genes marked with filled circles ( • ) are from
B. coagulans strains. The accession numbers of the genes downloaded from the NCBI database are shown in parentheses. A scale bar for the genetic
distance is shown at the bottom.



SCIENTIFIC REPORTS | 4:3926 | DOI: 10.1038/srep03926 




Figure 4 | Multiple sequence alignment of D-lactate dehydrogenases. D-Lactate dehydrogenases were aligned by using ClustalX. 2DLD and 1J4A are the 
accession numbers for D-Lactate dehydrogenase from Lactobacillus helveticus and Lactobacillus bulgaricus, respectively, in the PDB. Visualization of the 
multiple sequence alignment was performed by ESPript. Secondary structure elements were calculated based on the structure of 1J4A. 
The residues that are marked in green are the key active sites that may affect the function. 



activation of the transcription of genes coding for DNA uptake in B. 
coagulans might differ from that of B. subtilis. 

Amino acid, cofactor, and vitamin biosynthesis. Nutrient require- 
ments are important in industrial microbial fermentations. We 
identified most of the amino acid biosynthetic pathways in the B. 
coagulans strains using the comparative pathway tool of PATRIC 
(Data S2). However, the synthetic pathways for L-histidine are 
incomplete in all sequenced strains. The histidine biosynthesis enzymes of B.
subtilis are encoded by hisABCDEFGIJ 37. The gene for
histidinol-phosphatase (hisJ, EC: 3.1.3.15), which catalyzes the
dephosphorylation of histidinol phosphate to histidinol, is absent from all the B.
coagulans genomes. Moreover, there are no transport systems to 
import histidine through the membrane. In strain XZL4, we could 
not find the gene that encodes glutamate-ammonia ligase (EC: 
6.3.1.2), which converts L-glutamate to L-glutamine. Pathways for 
the synthesis of several cofactors, such as biotin, vitamin B6, and 
lipoic acid, are absent from all B. coagulans strains. However,
according to KEGG annotation, we identified at least one
biotin transport system (bioY), through which the cells could
obtain biotin from the surrounding medium 38 . The biosynthesis 
pathways for other cofactors, such as pantothenate, CoA, 
riboflavin, FAD and FMN, are present in all strains. 

Diversifying selection of genes associated with amino acid
metabolism. Thermophily is a favorable fermentation
feature of B. coagulans. According to Darwin, diversifying selection is
the main driving force of evolution, in which the genes involved in
environmental adaptation are usually under strong selection
pressure 39. Temperature, as a dominant selective pressure, could apply
strong selection pressure on the genes that are important for thermal
adaptation 40. Therefore, calculating the positive selection pressure on
each gene could help to identify the key genes that allow B. coagulans
to survive at high temperatures 39,41. Based on the genome-wide
positive selection analysis, we found that a large number
of genes in amino acid metabolism pathways show significant evidence
for positive selection (P-value < 0.01, Table 2; detailed
information is available at
http://202.120.45.186/~webserver/kaks/detail.php?jobId=s4ZLfBfSGJ). In addition,
many of these genes are associated with stress resistance. For
example, rocR encodes a 52-kDa polypeptide that belongs to the NtrC/NifA
family of transcriptional activators. It has been reported that a
B. subtilis strain carrying a rocR null mutation is unable to use
arginine as the sole nitrogen source, suggesting that RocR is a positive
regulator of arginine catabolism 42. RocR is also thought to be
essential for nitrogen metabolism in response to various stresses 43.
Histidinol dehydrogenase (HisD), which catalyzes the last step in
histidine biosynthesis, was reported as a virulence factor in the
intracellular pathogen Brucella suis 44. In B. subtilis, in response to
diverse growth-limiting stresses, hisD expression is controlled by σB,
which governs a large set of general stress proteins 45.

Restriction-modification and CRISPR-Cas systems. Fermentation
failures due to bacteriophage attack result in substantial economic
losses in the fermentation industry 46, especially during open
fermentation. Restriction-modification (R-M) and CRISPR-Cas are two
types of general defense systems that protect cells from foreign
DNA. The systems are compatible and act together to increase the
overall phage resistance of the cells 47.

R-M systems are nearly universal and have been found in more 
than 90% of bacterial and archaeal genomes 48 . In general, within a 
cell, a methyltransferase protects host DNA by modifying a specific 
nucleic acid. The restriction endonuclease cleaves any foreign DNA 
that contains a specific recognition site, which is not protected by the 
modification 47. By comparing the genome sequences to those in the
REBASE database, we identified various R-M systems in the B.
coagulans strains (Table 3), comprising approximately 1% of the
genome. The majority of the B. coagulans R-M systems are Type I

Table 2 | Genes under significant positive selection from the amino acid metabolism pathways

Gene | P-value | Pathway | Description
rocR |  | Arginine Metabolism | Regulatory protein in arginine utilization
kbl |  | Glycine, Serine and Threonine Metabolism | Glycine C-acetyltransferase
 |  | Lysine Metabolism; Cysteine & Methionine Metabolism; Glycine & Serine & Threonine Metabolism | Aspartate-semialdehyde dehydrogenase
trmA | 0.0067 | Histidine Metabolism | tRNA (uracil-5-)-methyltransferase
hisD | 0.0142 | Histidine Metabolism | Histidinol dehydrogenase
tcyA | 0.0301 | Amino-acid transporter system | Cystine ABC transporter
mtnE | 0.0375 | Cysteine & Methionine Metabolism | Transaminase



(>50%). In these systems, cleavage occurs at variable distances from 
the recognition sequence. 
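
The Type I share can be checked directly against the gene counts in Table 3. This is only a rough sketch: Table 3 counts genes rather than whole systems, so the fractions are a proxy, and reading the fourth column as Type IV is an assumption made here because that header is poorly legible. The variable names are illustrative.

```python
# Gene counts per strain from Table 3: (Type I, Type II, Type III, Type IV).
# The fourth column is assumed to be Type IV.
rm_genes = {
    "2-6":  (7, 1, 0, 2),
    "36D1": (19, 0, 2, 3),
    "XZL9": (9, 1, 0, 3),
    "XZL4": (10, 4, 0, 2),
    "H-1":  (14, 3, 5, 2),
    "DSM1": (6, 1, 2, 3),
}


def type_i_fraction(counts):
    """Fraction of R-M genes in one strain that belong to Type I systems."""
    return counts[0] / sum(counts)


per_strain = {strain: type_i_fraction(c) for strain, c in rm_genes.items()}
overall = sum(c[0] for c in rm_genes.values()) / sum(sum(c) for c in rm_genes.values())

for strain, fraction in per_strain.items():
    print(f"{strain}: {fraction:.1%} Type I")
print(f"all strains: {overall:.1%} Type I")
```

With these counts, Type I genes account for roughly two thirds of all R-M genes across the six strains, although the share in DSM1 is exactly one half.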

The CRISPR-Cas systems, which comprise clustered
regularly interspaced short palindromic repeats along with their
associated (Cas) proteins, are hyper-variable genetic loci that are widely
distributed in bacteria and archaea 49. This defense mechanism
provides immunity against invading genetic elements. Because
the phages that attack bacteria are abundant in soil habitats, many
soil bacteria carry CRISPR sequences. CRISPR-Cas systems are
commonly found in the B. coagulans strains. In the genome of strain 2-6,
we identified two different confirmed CRISPR loci; in strain 36D1,
we found four different confirmed CRISPR loci (Table 4). However,
only one CRISPR locus in each genome was associated with cas genes
(CRISPR_2-6_2 and CRISPR_36D1_2), both of which include more
than 40 spacers (Figure 5). We carefully compared the direct repeat
(DR) sequences of the different CRISPR loci and found that the DR
of each CRISPR locus has no more than 3 SNPs, indicating that the
repeats are highly conserved. In the remaining strains, we could not
identify any complete CRISPR-Cas systems because the genomes are incomplete.
However, there are cas genes in all of the draft genomes except for
that of strain XZL4 (Table S4), implying that a complete CRISPR-Cas
system may exist in XZL9, H-1, and DSM1. Based on a BLAST search of
the GenBank database, we found that some CRISPR spacers have
homology to sequences from different sources, including phage
sequences (CRISPR_2-6_2_S9, CRISPR_2-6_2_S43, and
CRISPR_36D1_2_S10) and plasmids (CRISPR_36D1_2_S3 and
CRISPR_36D1_2_S10) (Data S3). Moreover, these two strains (2-6 and
36D1) share some common spacers (>90% identity), whereas the
other spacers have no homology in GenBank. However, some
studies have recently suggested that as yet unidentified spacers
might mediate the interaction between CRISPR and the
bacteriophage or the environment 50. The results of IslandViewer indicate
that the CRISPR-Cas systems are located in genomic islands in both
strains (2-6 and 36D1; Table S5), and they are flanked by
transposases (Figure 5). The GC contents of the CRISPR-Cas systems are
approximately 34.0% and 32.6%, whereas those of the entire
genomes are 47.3% and 46.5%, in strains 2-6 and 36D1, respectively. In
B. coagulans 2-6, six cas genes (cas1-6) were identified downstream of small,
host-encoded silencing RNAs. cas3 encodes a large protein



Table 3 | Number of genes in Restriction-Modification systems
found in the B. coagulans strains

Strain | Type I a | Type II a | Type III a | Type IV a
2-6 | 7 | 1 | 0 | 2
36D1 | 19 | 0 | 2 | 3
XZL9 | 9 | 1 | 0 | 3
XZL4 | 10 | 4 | 0 | 2
H-1 | 14 | 3 | 5 | 2
DSM1 | 6 | 1 | 2 | 3

a: Restriction-Modification systems are classified based on the REBASE database.



with separate helicase and DNase activities, which is an important
characteristic for CRISPR-Cas classification 51. According to the
classification of Makarova et al. 51 and the cas genes found in the different strains,
the CRISPR-Cas systems found in strains 36D1, DSM1 and XZL9
may belong to the typical Type I CRISPR-Cas family, which contains
the cas3 gene. However, those of strains 2-6 and H-1 belong to an
unclassified CRISPR-Cas family without a cas2 gene.
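
Two simple consistency checks can be run on the CRISPR loci listed in Table 4: how similar the direct repeats of different loci are, and what mean spacer length the array coordinates imply. The sketch below is illustrative and is not the procedure used in this study. It assumes that each array starts and ends with a repeat, that the coordinates are inclusive, and that a repeat may be reported on either strand; the helper names are hypothetical.

```python
# (start, end, direct repeat, number of spacers), transcribed from Table 4.
CRISPR_LOCI = {
    "CRISPR_2-6_1":  (2_497_017, 2_497_370, "GTTTCAATTCCTTATAGGTAAAATA", 5),
    "CRISPR_2-6_2":  (2_499_794, 2_503_038, "ATTTAAATACATCCAATGTTAAAGTTCAAC", 49),
    "CRISPR_36D1_1": (1_096_065, 1_097_943, "GTTTCAATTCCTCATAGGTAAAATACTAAC", 28),
    "CRISPR_36D1_2": (2_117_433, 2_121_795, "GTTTGTATTTTACCTATGAGGAATTGAAAC", 65),
    "CRISPR_36D1_3": (2_123_872, 2_124_703, "GTTTGTATTTTACCTATGAGGAATTGAAAC", 12),
    "CRISPR_36D1_4": (2_126_172, 2_127_007, "GTTTGTATTTTACCTATGAGGAATTGAAAC", 12),
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def repeat_mismatches(a, b):
    """Mismatches between two repeats over their shared prefix, on the better strand of b."""
    n = min(len(a), len(b))

    def hamming(x, y):
        return sum(p != q for p, q in zip(x[:n], y[:n]))

    return min(hamming(a, b), hamming(a, reverse_complement(b)))


def mean_spacer_length(start, end, repeat, n_spacers):
    """Mean spacer length, assuming n_spacers + 1 repeats and inclusive coordinates."""
    array_length = end - start + 1
    return (array_length - (n_spacers + 1) * len(repeat)) / n_spacers


repeat_of = {name: locus[2] for name, locus in CRISPR_LOCI.items()}
print(repeat_mismatches(repeat_of["CRISPR_36D1_1"], repeat_of["CRISPR_36D1_2"]))
print(repeat_mismatches(repeat_of["CRISPR_2-6_1"], repeat_of["CRISPR_36D1_1"]))
for name, locus in CRISPR_LOCI.items():
    print(f"{name}: mean spacer length {mean_spacer_length(*locus):.1f} nt")
```

Comparing on both strands matters here because the repeat listed for CRISPR_36D1_2 is close to the reverse complement of the one listed for CRISPR_36D1_1.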

Discussion 

As good platform chemical producers, B. coagulans strains have
many of the characteristics required to meet the needs of
white biotechnology. High-throughput sequencing technology and
comparative genomic analysis have provided us with a full landscape
of central carbon metabolism. In particular, highly efficient sugar
metabolism pathways are the genetic foundation for high lactic acid
yield. This may be because the genomes of B. coagulans strains have
been shaped by their evolutionary history. For example, to obtain a
competitive advantage, B. coagulans strains have evolved very
efficient metabolic pathways (EMP and PPP) to produce high
concentrations of lactic acid from various substrates as a means to inhibit
the growth of other microorganisms. Pathways that produce
more by-products are under less selection pressure, and their
key enzymes, such as phosphoketolase in the PKP,
may easily lose their function. Besides producing lactic acid, B.
coagulans strains also produce various other platform bio-chemicals,
such as acetoin and butanediol. Considering their highly efficient
sugar metabolism, if carbon flux is redirected towards the
acetoin-butanediol pathway instead of the lactic acid pathway by knocking
out L-lactate dehydrogenase, B. coagulans strains could become very
good producers of these useful platform bio-chemicals from
renewable resources 20,23.

Genome-wide positive selection analysis led us to examine the 
relationship between amino acid metabolism and thermotolerance. 
In previous studies 52,53 , B. coagulans strains required additional 
nutrients, such as amino acids, to maintain their rapid growth during 
high-temperature fermentation. The research of Marshall and Beers
showed that B. coagulans strains require different nutritional
supplements at different temperatures 54. Two explanations are possible:
(i) the cell requires more nutrients to make up for proteins
that are denatured by heat; or (ii) enzymes have lower activity at higher
temperatures. These hypotheses need to be validated in further
studies. However, the genes that are under strong positive selection
pressure and play a key role in amino acid metabolism may also be
very important in thermal adaptation. This information may
provide some clues for further studies aimed at reducing the
requirements for expensive additional nutrients.

The CRISPR-Cas and R-M systems have been shown to work together
to protect bacteria against invaders such as phages and plasmids 47 .
These defense systems are a double-edged sword. On the one hand, they
keep foreign DNA from being incorporated into the cells. On the
other hand, these systems also limit the genetic engineering of these
strains for the production of other useful chemicals. Many researchers
have tried to develop highly efficient general genetic engineering
systems. For example, Rhee et al. 55 developed an electroporation
method to transfer plasmid DNA into B. coagulans strains. In addition,
they constructed a B. coagulans/E. coli shuttle vector that
contains the rep region from a native plasmid of B. coagulans strain
P4-102B. Wang et al. 23 built a temperature-sensitive plasmid to delete
the native ldh and alsS (encoding acetolactate synthase) genes of
strain P4-102B. The engineered bacteria can be used to produce
either L- or D-lactic acid at high titers and yields from
nonfood carbohydrates. Kovacs et al. 13 and van Kranenburg et al. 19
also developed a targeted gene disruption system using the pSH71 replicon.
Moreover, Kovacs et al. 13 successfully
applied the widely used Cre-lox system for genomic modifications
and removal of selectable genes. However, highly efficient genetic
tools for B. coagulans are still not available, which limits
its potential as a next-generation production platform for building
block chemicals or biofuels from renewable resources 13 . As mentioned
above, the CRISPR-Cas systems can keep foreign genetic
material out of B. coagulans strains. A full understanding of the
diverse spacers in the CRISPR-Cas immune system could provide
useful suggestions for modifying the current genetic tools to
expand their host range 13 . R-M systems also act to protect the strains
against invading DNA 48 . Exogenous DNA with foreign methylation
patterns is recognized and rapidly degraded 47 . Zhang et al. 48
described a new pipeline that could potentially be used as a universal
genetic engineering tool to overcome the problem of multiple R-M
systems. Another very important feature of a highly efficient genetic
tool is a good plasmid origin. However, with the limited knowledge of
B. coagulans, we could not find any highly efficient plasmid origin. As
more sequence data are obtained, new information and materials
may become available to improve genetic engineering tools to meet the
requirements of commercial applications.

Table 4 | CRISPR-Cas systems found in the B. coagulans strains

CRISPR-Cas      Strain  Start      End        Repeat                          Number of spacers
CRISPR 2-6 1    2-6     2,497,017  2,497,370  GTTTCAATTCCTTATAGGTAAAATA       5
CRISPR 2-6 2    2-6     2,499,794  2,503,038  ATTTAAATACATCCAATGTTAAAGTTCAAC  49
CRISPR 36D1 1   36D1    1,096,065  1,097,943  GTTTCAATTCCTCATAGGTAAAATACTAAC  28
CRISPR 36D1 2   36D1    2,117,433  2,121,795  GTTTGTATTTTACCTATGAGGAATTGAAAC  65
CRISPR 36D1 3   36D1    2,123,872  2,124,703  GTTTGTATTTTACCTATGAGGAATTGAAAC  12
CRISPR 36D1 4   36D1    2,126,172  2,127,007  GTTTGTATTTTACCTATGAGGAATTGAAAC  12

In summary, we examined the genomes of six B. coagulans strains
in an attempt to explain their favorable fermentation features. Their rapid
and efficient carbon metabolism may contribute to the efficient production
of platform bio-chemicals. The ability to ferment at high
temperature and their encoded immune systems could protect the
B. coagulans strains from phage infection and contamination. We
suggest that these specific features underlie the utility of
these B. coagulans strains as excellent industrial strains.

Methods 

Genome sequencing. Four newly isolated Bacillus coagulans strains (2-6, H-1, XZL9,
and XZL4) were identified by 16S rDNA sequencing, morphology, and physiological
analysis. The genomic DNA of these four strains and of the type strain B. coagulans
DSM1 from DSMZ was extracted using the Wizard Genomic DNA Purification Kit
(Promega, USA). Whole genomes of these five B. coagulans strains were sequenced
by the Chinese National Human Genome Center at Shanghai, China. The paired-end reads
were assembled de novo using the program Velvet with manually optimized settings.
The Phred/Phrap/Consed package was used to finish the genomes. To fill the gaps among
the de novo assembled contigs in B. coagulans 2-6, we followed the
reference-guided mapping method 56 . The genomes of 2-6, XZL4, XZL9, DSM1 and H-1 were
submitted to the web service RAST for automatic annotation followed by manual
checking. The annotation of these genomes can be publicly obtained at the RAST
website with a guest account. The genome sequence of B. coagulans 36D1 was
downloaded from GenBank. We used PATRIC, which is the NIAID/PathoSystems
Resource Integration Center, and RAST for comparative genomic and metabolic
pathway analysis. IslandViewer was used to detect genomic islands (GIs) in the
genomes. GIs that were predicted by at least one method (IslandPick, SIGI-HMM or
IslandPath-DIMOB) were accepted. The CRISPR-Cas systems were identified with
CRISPR Finder. The restriction-modification systems were predicted based on the
data of REBASE 57 . The genomic context was visualized using Circos.

Figure 5 | Overview of the CRISPR-Cas systems in B. coagulans strains 2-6 and 36D1. (A) Genetic map of the CRISPR-Cas systems detected in two B.
coagulans strains (2-6 and 36D1). Cas genes were detected around the CRISPR loci. Different colors show the different CRISPR loci and cas genes: cas1:
light purple; cas2: red; cas3: dark red; cas4: purple; cas5: gold; cas6: pink; cas7: green; cas8: orange; CRISPR: blue; IS3: black; IS4: gray; XRE transcriptional
regulator: green. (B) Overview of the five CRISPR loci in the two B. coagulans strains. The repeats are shown as gray rectangles and the spacers
are shown as white diamonds. Spacers with similar sequences (>90% identity) in the studied genomes are shown in the same color.
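
The acceptance rule for genomic islands described under Genome sequencing (a region is kept if at least one of IslandPick, SIGI-HMM or IslandPath-DIMOB predicts it) amounts to taking the union of the per-method predictions. The following is a minimal Python sketch of that union, not the IslandViewer implementation: the function name merge_islands is hypothetical, the coordinates are invented for illustration, and merging directly adjacent regions is an assumption of this sketch.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive genome coordinates


def merge_islands(predictions: Dict[str, List[Interval]]) -> List[Interval]:
    """Union of GI predictions: keep any region reported by at least one method."""
    # Pool the intervals from every method and sort them by start position.
    pooled = sorted(iv for ivs in predictions.values() for iv in ivs)
    merged: List[Interval] = []
    for start, end in pooled:
        if merged and start <= merged[-1][1] + 1:
            # Overlapping or directly adjacent: extend the previous island.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical coordinates, for illustration only (not results from this study).
example = {
    "IslandPick": [(100_000, 120_000)],
    "SIGI-HMM": [(115_000, 130_000), (500_000, 510_000)],
    "IslandPath-DIMOB": [(510_001, 520_000)],
}
print(merge_islands(example))
```

A stricter rule (for example, requiring agreement between two methods) would need an intersection step instead; the union shown here mirrors the "at least one method" criterion.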

Phylogenetic analysis. Orthologous relationships between protein-coding sequences
in the genomes were determined by using OrthoMCL, with the following criteria:
identity > 50% and e-value < 1e-5. Single-copy orthologs common to all genomes
were used to construct a genome-scale phylogenetic tree. Briefly, individual orthologs
were aligned by using MUSCLE, back-translated to DNA sequences by using ad hoc
Perl scripts similar to the strategy of PAL2NAL 58 , and concatenated to obtain a
"chromosomal" alignment. The best-fitting model of sequence evolution was
determined using jModelTest2 with 11 substitution schemes. Model selection was
computed using the Akaike information criterion (AIC). The phylogenetic tree was
constructed with PHYML under the GTR + gamma + I model according to the result of
jModelTest2.
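
The back-translation and concatenation steps can be sketched as follows. This is an illustrative Python version rather than the ad hoc Perl scripts used in the study; the function names are hypothetical, and the sketch assumes that each coding sequence has had its stop codon removed and corresponds residue for residue to the aligned protein.

```python
from typing import Dict


def back_translate(aligned_protein: str, cds: str) -> str:
    """Thread an unaligned CDS onto its aligned protein, codon by codon."""
    residues = aligned_protein.replace("-", "")
    if len(cds) != 3 * len(residues):
        raise ValueError("CDS length must be 3 x the number of residues")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    out, k = [], 0
    for ch in aligned_protein:
        if ch == "-":
            out.append("---")      # a protein gap becomes a three-base gap
        else:
            out.append(codons[k])  # next codon of the original CDS
            k += 1
    return "".join(out)


def concatenate(alignments: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Join codon alignments (ortholog -> strain -> sequence) per strain."""
    orthologs = sorted(alignments)           # fixed order for every strain
    strains = sorted(alignments[orthologs[0]])
    return {s: "".join(alignments[o][s] for o in orthologs) for s in strains}


# Toy input, for illustration only.
print(back_translate("MK-L", "ATGAAACTG"))
```

Keeping the ortholog order fixed across strains is what makes the concatenated columns comparable in the resulting "chromosomal" alignment.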

Molecular evolutionary analysis. We performed a positive selection analysis with
PSP 59 , which is a web tool designed for calculating the selection pressure across
multiple closely related genomes (http://db-mml.sjtu.edu.cn/PSP/ or
http://202.120.45.186/~-webserver/kaks/). Using the branch-site strain-specific model, we analyzed
the positive selection pressure across 26 Bacillus strains with B. coagulans as the
"foreground branches" (Table S1). To determine the level of significance for the
likelihood ratio tests (LRTs), we calculated the P-value using a χ2 distribution, with the number of degrees of
freedom corresponding to the difference in the number of parameters between the nested models.
We used the conservative Bayes empirical Bayes (BEB) approach to calculate the posterior probabilities of a
specific codon site and to identify those with higher probabilities of being under
diversifying selection.
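
The significance calculation for a likelihood ratio test can be sketched as below: the statistic is twice the log-likelihood difference between the nested models, compared with a chi-square distribution. This is a minimal standard-library Python sketch, not the PSP code; the function names and the example log-likelihoods are invented for illustration, and the default of one degree of freedom is an assumption (the text only states that the degrees of freedom equal the difference in parameter count).

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in odd powers of sqrt(x).
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    normal_pdf = math.exp(-half) / math.sqrt(2.0 * math.pi)
    return math.erfc(math.sqrt(half)) + 2.0 * normal_pdf * total


def lrt_pvalue(lnl_null: float, lnl_alt: float, df: int = 1) -> float:
    """P-value of the LRT statistic 2 * (lnL_alt - lnL_null)."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))  # clamp tiny negative values
    return chi2_sf(stat, df)


# Invented log-likelihoods, for illustration only.
print(lrt_pvalue(lnl_null=-1234.56, lnl_alt=-1230.12, df=1))
```

When many genes are scanned, the per-gene P-values would normally also need a multiple-testing correction; that step is not described in the text and is omitted here.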

Accession numbers. The genome sequences of B. coagulans 2-6, XZL4, XZL9, DSM1
and H-1 were deposited in the NCBI database under the accession numbers CP002472,
AFWM00000000, ANAP00000000, ALAS01000000 and ANAQ00000000,
respectively.